Suffix Tree of Alignment: An Efficient Index for Similar Data

Authors

Joong Chae Na

Heejin Park

Maxime Crochemore

Jan Holub

Costas S. Iliopoulos

Laurent Mouchard

Kunsoo Park

Abstract

We consider an index data structure for similar strings. The generalized suffix tree is one possible solution. The generalized suffix tree of two strings A and B is a compacted trie representing all suffixes of A and B. It has |A| + |B| leaves and can be constructed in O(|A| + |B|) time. However, if the two strings are similar, the generalized suffix tree is not efficient because it does not exploit the similarity, which is usually represented as an alignment of A and B. In this paper we propose a space- and time-efficient suffix tree of alignment that exploits the similarity in an alignment. Our suffix tree for an alignment of A and B has |A| + l_d + l_1 leaves, where l_d is the sum of the lengths of all parts of B different from A and l_1 is the sum of the lengths of some common parts of A and B. The space reduction does not compromise pattern search: our suffix tree can be searched for a pattern P in O(|P| + occ) time, where occ is the number of occurrences of P in A and B. We also present an efficient algorithm to construct the suffix tree of alignment. When the suffix tree is constructed from scratch, the algorithm requires O(|A| + l_d + l_1 + l_2) time, where l_2 is the sum of the lengths of other common substrings of A and B. When the suffix tree of A is already given, it requires O(l_d + l_1 + l_2) time.
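To make the baseline concrete, the following is a minimal sketch of an index over all suffixes of two strings. It is not the suffix tree of alignment, nor a real generalized suffix tree: it is a naive sorted-suffix list (a stand-in chosen for brevity) that takes quadratic space and does not achieve the O(|P| + occ) search bound. The function names `build_generalized_suffix_index` and `find_occurrences` are hypothetical. It only illustrates that the baseline stores |A| + |B| suffixes, one per leaf, regardless of how similar A and B are.

```python
from bisect import bisect_left


def build_generalized_suffix_index(a, b):
    """Return sorted (suffix, string_id, start) triples for every suffix of a and b.

    Naive stand-in for a generalized suffix tree: one entry per suffix,
    so the index always holds len(a) + len(b) entries.
    """
    entries = []
    for string_id, s in (("A", a), ("B", b)):
        for start in range(len(s)):
            entries.append((s[start:], string_id, start))
    entries.sort()
    return entries


def find_occurrences(index, pattern):
    """Return sorted (string_id, start) pairs where pattern occurs."""
    # (pattern,) sorts before any triple whose suffix equals pattern,
    # so bisect_left lands on the first suffix >= pattern.
    lo = bisect_left(index, (pattern,))
    hits = []
    for suffix, string_id, start in index[lo:]:
        # Suffixes sharing the prefix are contiguous in sorted order.
        if not suffix.startswith(pattern):
            break
        hits.append((string_id, start))
    return sorted(hits)


if __name__ == "__main__":
    a, b = "banana", "bandana"  # two similar example strings (assumed data)
    index = build_generalized_suffix_index(a, b)
    print(len(index))                      # number of stored suffixes
    print(find_occurrences(index, "ana"))  # occurrences in A and in B
```

For two similar strings such as these, many of the stored suffixes share long common parts; that redundancy is what the suffix tree of alignment is designed to avoid.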

Similar resources

Suffix Array of Alignment: A Practical Index for Similar Data

The suffix tree of alignment is an index data structure for similar strings. Given an alignment of similar strings, it stores all suffixes of the alignment, called alignment-suffixes. An alignment-suffix represents one suffix of a string or suffixes of multiple strings starting at the same position in the alignment. The suffix tree of alignment makes good use of similarity in strings theoretica...

Phrase Based Document Retrieving by Combining Suffix Tree index data structure and Boyer-Moore faster string searching algorithm

Phrases have been considered a more informative feature term for improving the effectiveness of document retrieval. This paper proposes a phrase-based document retrieval algorithm that retrieves similar documents by combining two existing techniques: the suffix tree, an index data structure, and the Boyer-Moore algorithm, a fast string searching algorithm. The suffix tree is constructed based on E. ...

A Partition-Based Suffix Tree Construction and Its Applications

A suffix tree (also called suffix trie, PAT tree, or position tree) is a powerful data structure that presents the suffixes of a given string in a way that allows a fast implementation of important string operations. The idea behind suffix trees is to assign to each symbol of a string an index corresponding to its position in the string. The first symbol in the string will have the index 1, the...

Implementation and performance analysis of efficient index structures for DNA search algorithms in parallel platforms

Because of the large datasets that are usually involved in deoxyribonucleic acid (DNA) sequence alignment, the use of optimal local alignment algorithms (e.g., Smith–Waterman) is often unfeasible in practical applications. As such, more efficient solutions that rely on indexed search procedures are often preferred to significantly reduce the time to obtain such alignments. Some data structures ...

An efficient approach for sequence matching in large DNA databases

In molecular biology, DNA sequence matching is one of the most crucial operations. Since DNA databases contain a huge volume of sequences, fast indexes are essential for efficient processing of DNA sequence matching. In this paper, we first point out the problems of the suffix tree, an index structure widely-used for DNA sequence matching, in respect of storage overhead, search performance, and...

Publication date: 2013
